Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora

نویسندگان

  • Ramy Eskander
  • Nizar Habash
  • Owen Rambow
چکیده

We present a method for automatically learning inflectional classes and associated lemmas from morphologically annotated corpora. The method consists of a core languageindependent algorithm, which can be optimized for specific languages. The method is demonstrated on Egyptian Arabic and German, two morphologically rich languages. Our best method for Egyptian Arabic provides an error reduction of 55.6% over a simple baseline; our best method for German achieves a 66.7% error reduction.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Morphological Annotation of the Lithuanian Corpus

As the development of information technologies makes progress, large morphologically annotated corpora become a necessity, as they are necessary for moving onto higher levels of language computerisation (e. g. automatic syntactic and semantic analysis, information extraction, machine translation). Research of morphological disambiguation and morphological annotation of the 100 million word Lith...

متن کامل

A Resource-light Approach to Russian Morphology: Tagging Russian using Czech resources

In this paper, we describe a resource-light system for the automatic morphological analysis and tagging of Russian. We eschew the use of extensive resources (particularly, large annotated corpora and lexicons), exploiting instead (i) pre-existing annotated corpora of Czech; (ii) an unannotated corpus of Russian. We show that our approach has benefits, and present what we believe to be one of th...

متن کامل

Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic

We present new language resources for Moroccan and Sanaani Yemeni Arabic. The resources include corpora for each dialect which have been morphologically annotated, and morphological analyzers for each dialect which are derived from these corpora. These are the first sets of resources for Moroccan and Yemeni Arabic. The resources will be made available to the public.

متن کامل

Do LSTMs really work so well for PoS tagging? - A replication study

A recent study by Plank et al. (2016) found that LSTM-based PoS taggers considerably improve over the current state-of-theart when evaluated on the corpora of the Universal Dependencies project that use a coarse-grained tagset. We replicate this study using a fresh collection of 27 corpora of 21 languages that are annotated with fine-grained tagsets of varying size. Our replication confirms the...

متن کامل

A Cross-language Approach to Rapid Creation of New Morpho-syntactically Annotated Resources

We take a novel approach to rapid, low-cost development of morpho-syntactically annotated resources without using parallel corpora or bilingual lexicons. The overall research question is how to exploit language resources and properties to facilitate and automate the creation of morphologically annotated corpora for new languages. This portability issue is especially relevant to minority languag...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013